How do you bootstrap???

Using bootstrap resamples to generate a confidence interval

From your original sample, draw observations with replacement until the resample is the same size as the original sample.

This is your bootstrap resample.

Repeat this process many, many times.

Calculate a numerical summary (e.g., mean, median) for each bootstrap resample.

These are your bootstrap statistics.
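
The procedure above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the vector `original_sample` holds made-up values, the mean is an arbitrary choice of summary, and 1,000 repetitions is an arbitrary number of resamples.

```r
set.seed(2024)  # make the random resamples reproducible

# Hypothetical original sample (made-up values, for illustration only)
original_sample <- c(12, 15, 9, 20, 14, 11, 18, 16, 13, 10)
n <- length(original_sample)

# One bootstrap resample: n draws from the sample, with replacement
one_resample <- sample(original_sample, size = n, replace = TRUE)

# Repeat many times, computing the statistic (here, the mean) each time
boot_means <- replicate(
  1000,
  mean(sample(original_sample, size = n, replace = TRUE))
)

# The bootstrap statistics: one mean per resample
length(boot_means)
```

Because the draws are made with replacement, some original values usually appear more than once in a resample and others not at all.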

Bootstrap Distribution

definition: a distribution of the bootstrap statistics from every bootstrap resample


Displays the variability in the statistic that could have happened with repeated sampling.

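Approximates the shape and spread of the true sampling distribution, but is centered at the observed sample statistic rather than at the population parameter.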
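
One way to see this approximation is to compare the spread of a bootstrap distribution with the textbook standard error of the mean. The sketch below uses simulated data; the vector `x`, its population mean of 100, and the 2,000 repetitions are assumptions made for the example.

```r
set.seed(1)

# Hypothetical sample of 50 values (simulated, for illustration only)
x <- rnorm(50, mean = 100, sd = 15)

# Bootstrap distribution of the sample mean
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))

# Spread of the bootstrap distribution vs. the formula-based standard error
boot_se    <- sd(boot_means)
formula_se <- sd(x) / sqrt(length(x))
c(bootstrap = boot_se, formula = formula_se)

# The bootstrap distribution is centered near the sample mean,
# not necessarily near the population mean of 100
c(boot_center = mean(boot_means), sample_mean = mean(x))
```

The two standard errors should be close, which is why the bootstrap distribution can stand in for the sampling distribution when measuring variability.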

Penguins!

Parameter: \(\beta_1\)

The slope of the linear relationship between bill length and body mass for all penguins in the Palmer Archipelago
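
The sample estimate of this slope can be computed directly with `lm()`. This sketch assumes the data come from the `palmerpenguins` package (the slides do not name the source) and uses the same response and explanatory variables as the rest of the deck.

```r
library(palmerpenguins)  # assumed source of the `penguins` data

# Least-squares fit of bill length (response) on body mass (explanatory);
# lm() drops rows with missing values by default
fit <- lm(bill_length_mm ~ body_mass_g, data = penguins)

# Observed slope: estimated change in bill length (mm) per extra gram
b1 <- coef(fit)[["body_mass_g"]]
b1
```

The bootstrap is then used to judge how much this observed slope could vary from sample to sample.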

Generating a bootstrap resample

Step 1: specify() your response and explanatory variables

Step 2: generate() bootstrap resamples

Step 3: calculate() the statistic of interest

Declare your variables!

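library(infer)           # specify(), generate(), calculate(), and the %>% pipe
library(palmerpenguins)  # penguins data (assumed source; not named on the slide)
penguins %>% 
  specify(response = bill_length_mm, explanatory = body_mass_g)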

Generate your resamples!

penguins %>% 
  specify(response = bill_length_mm, 
          explanatory = body_mass_g) %>% 
  generate(reps = 500, type = "bootstrap")


reps – the number of resamples you want to generate

type = "bootstrap" – the method used to generate the new samples; "bootstrap" draws rows with replacement from the original sample

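To see what `generate()` returns, it can help to count the rows in each resample. This is a small sketch, assuming the data come from the `palmerpenguins` package; the seed and the choice of only 5 reps are arbitrary and just keep the output short.

```r
library(infer)
library(palmerpenguins)  # assumed source of the `penguins` data

set.seed(123)  # reproducible resamples

# specify() warns that it drops rows with missing values
specified <- penguins %>%
  specify(response = bill_length_mm, explanatory = body_mass_g)

resamples <- specified %>%
  generate(reps = 5, type = "bootstrap")  # 5 reps only, to keep output small

# Each resample has as many rows as the complete-case original sample
table(resamples$replicate)
nrow(specified)
```

The `replicate` column labels which resample each row belongs to, so later steps can compute one statistic per resample.
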
Calculate your statistics!

# Save the bootstrap slopes so they can be plotted and summarized later
bootstrap <- penguins %>% 
  specify(response = bill_length_mm, 
          explanatory = body_mass_g) %>% 
  generate(reps = 500, 
           type = "bootstrap") %>% 
  calculate(stat = "slope")


"slope" – the statistic of interest

The final product

visualize(bootstrap) + 
  labs(title = "Bootstrap Distribution of 500 reps", 
       x = "Slope Statistic")

A plausible range of values for: \(\beta_1\)

The 95% confidence interval is…

get_confidence_interval(bootstrap, 
                        level = 0.95, 
                        type = "percentile")


Lower Bound    Upper Bound
0.00354        0.00452
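
The percentile method takes the middle 95% of the bootstrap statistics, so the same endpoints can be recovered with `quantile()`. This sketch assumes the data come from the `palmerpenguins` package and uses an arbitrary seed, so its endpoints will differ slightly from the ones shown above.

```r
library(infer)
library(palmerpenguins)  # assumed source of the `penguins` data

set.seed(123)  # arbitrary seed; results vary from run to run without one

bootstrap <- penguins %>%
  specify(response = bill_length_mm, explanatory = body_mass_g) %>%
  generate(reps = 500, type = "bootstrap") %>%
  calculate(stat = "slope")

# Percentile method by hand: the 2.5th and 97.5th percentiles of the slopes
manual_ci <- quantile(bootstrap$stat, probs = c(0.025, 0.975))

# infer's version, for comparison
infer_ci <- get_confidence_interval(bootstrap, level = 0.95, type = "percentile")

manual_ci
infer_ci
```

Since the response is measured in millimeters and the explanatory variable in grams, both endpoints are in millimeters of bill length per gram of body mass.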